Introduction

Article & Datasets

  • Article: Mora et al. (2019), Nature Communications, “Space Station conditions are selective but do not alter microbial characteristics relevant to human health”
  • Two sets of stacked bar graphs (Figs. 1 and 4) describing microbial diversity
  • Samples grouped by sampling session (A, B, C)
  • Number of reads per sample, stratified by taxonomic rank (domain, phylum, genus)

SD1

  • Supplementary Data 1 (Fig. 1): RSV (ribosomal sequence variant) table; excerpt of the first three RSVs, read counts per sample
    RSV ISSCapoA1 ISSCapoA2 ISSCapoA3 ISSCapoA4 ISSCapoA5 ISSCapoA6 ISSCapoA7 ISSCapoA8 ISSCapoA9 ISSCapoC1 ISSCapoC2 ISSCapoC3 ISSCapoC4 ISSCapoC5 ISSCapoC6 ISSCapoC7 ISSCapoB1 ISSCapoB2 ISSCapoB3 ISSCapoB4 ISSCapoB5 ISSCapoB6 ISSCapoB7 ISSCapoB8
    RSV_0001 0 0 0 0 0 0 0 0 0 0 10 0 0 160 0 0 0 0 0 31 1080 0 0 0
    RSV_0002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0
    RSV_0003 0 0 0 74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    Taxonomic classification column for the same RSVs:
    RSV_0001 k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Ruminococcaceae;g__Faecalibacterium;s__
    RSV_0002 k__Bacteria;p__Cyanobacteria;c__Melainabacteria;o__Gastranaerophilales;f__;g__;s__
    RSV_0003 k__Bacteria;p__Planctomycetes;c__Phycisphaerae;o__WD2101_soil_group;f__;g__;s__
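Collapsing this table to one rank (domain, phylum or genus) is what produces the stacked bars of Fig. 1. Our analysis used R; the minimal sketch below is in Python for illustration only. It assumes the three excerpt rows are already in memory with the rank separators (k__, p__, g__ and so on) written out in full. The names SAMPLES, SD1_EXCERPT, parse_taxonomy and collapse are hypothetical, and labelling empty ranks "unassigned" is our own choice, not necessarily how the paper builds its ‘unknown’ category:

```python
# Sketch: collapse RSV read counts to one taxonomic rank (domain, phylum or genus).
# Hypothetical helper names; data are the three SD1 excerpt rows.
from collections import defaultdict

# Sample columns in the order they appear in SD1 (A1-A9, then C1-C7, then B1-B8).
SAMPLES = (
    [f"ISSCapoA{i}" for i in range(1, 10)]
    + [f"ISSCapoC{i}" for i in range(1, 8)]
    + [f"ISSCapoB{i}" for i in range(1, 9)]
)

# RSV id -> (space-separated counts, lineage string)
SD1_EXCERPT = {
    "RSV_0001": (
        "0 0 0 0 0 0 0 0 0 0 10 0 0 160 0 0 0 0 0 31 1080 0 0 0",
        "k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;"
        "f__Ruminococcaceae;g__Faecalibacterium;s__",
    ),
    "RSV_0002": (
        "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0",
        "k__Bacteria;p__Cyanobacteria;c__Melainabacteria;"
        "o__Gastranaerophilales;f__;g__;s__",
    ),
    "RSV_0003": (
        "0 0 0 74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
        "k__Bacteria;p__Planctomycetes;c__Phycisphaerae;"
        "o__WD2101_soil_group;f__;g__;s__",
    ),
}

# Lineage prefixes used in the SD1 taxonomy strings.
RANK_PREFIX = {"domain": "k", "phylum": "p", "genus": "g"}


def parse_taxonomy(lineage):
    """Split 'k__X;p__Y;...' into {prefix: name}; empty names become 'unassigned'."""
    ranks = {}
    for field in lineage.split(";"):
        prefix, _, name = field.partition("__")
        ranks[prefix] = name or "unassigned"
    return ranks


def collapse(table, rank):
    """Sum read counts per sample for every taxon at the requested rank."""
    prefix = RANK_PREFIX[rank]
    totals = defaultdict(lambda: [0] * len(SAMPLES))
    for counts, lineage in table.values():
        taxon = parse_taxonomy(lineage).get(prefix, "unassigned")
        for j, value in enumerate(counts.split()):
            totals[taxon][j] += int(value)
    return dict(totals)


phylum = collapse(SD1_EXCERPT, "phylum")
print(phylum["Firmicutes"][SAMPLES.index("ISSCapoB5")])  # 1080
```

At genus level, RSV_0002 and RSV_0003 both fall into "unassigned", so their reads are pooled into one segment.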

SD2

  • Supplementary Data 2 (Fig. 4): taxonomic diversity inferred from the metagenomic dataset
    Taxonomy columns (first rows):
    domain | phylum | className | order | family | genus
    Viruses | unclassified (derived from Viruses) | unclassified (derived from Viruses) | Caudovirales | Podoviridae | AHJD-like viruses
    Bacteria | Firmicutes | Bacilli | Lactobacillales | Aerococcaceae | Abiotrophia
    Count columns, one per sample (first rows):
    COLA1 | COLB1 | N2A | N2B | N3C1 | N1C
    11 | 130 | 2 | 25 | 32 | 33
    17 | 898 | 10 | 98 | 102 | 119
    0 | 1 | 0 | 0 | 0 | 0

Table 1

  • Table 1 (tidy): sampled surfaces, ISS modules, wipe IDs and session IDs
    Sampled_surface | ISS_module | Wipe | Session
    Ambient air (field blank, FB) | Columbus | A5 | A
    Ambient air (field blank, FB) | Columbus | A5 | B
    Ambient air (field blank, FB) | Columbus | B1 | A
    Ambient air (field blank, FB) | Columbus | B1 | B
    Light covers | Columbus | A4 | A
    Light covers | Columbus | A4 | B
    Light covers | Columbus | B2 | A
    Light covers | Columbus | B2 | B
    SCC laptop | Columbus | A2 | A
    SCC laptop | Columbus | A2 | B
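Because Table 1 is tidy (one row per wipe and session), it can act as a lookup table when renaming sample columns. The minimal Python sketch below assumes only the ten rows shown here and uses hypothetical helper names. It shows that a wipe ID alone is not a unique key, since A5 occurs in both session A and session B, so any lookup has to combine wipe and session. How SD1 column names such as ISSCapoA1 map onto these keys is not stated in this table, so that step is left out:

```python
# Sketch: Table 1 as tidy records (one row per wipe and session).
# Rows copied from the excerpt; helper names are hypothetical.
from collections import defaultdict

TABLE1 = [
    # (Sampled_surface, ISS_module, Wipe, Session)
    ("Ambient air (field blank, FB)", "Columbus", "A5", "A"),
    ("Ambient air (field blank, FB)", "Columbus", "A5", "B"),
    ("Ambient air (field blank, FB)", "Columbus", "B1", "A"),
    ("Ambient air (field blank, FB)", "Columbus", "B1", "B"),
    ("Light covers", "Columbus", "A4", "A"),
    ("Light covers", "Columbus", "A4", "B"),
    ("Light covers", "Columbus", "B2", "A"),
    ("Light covers", "Columbus", "B2", "B"),
    ("SCC laptop", "Columbus", "A2", "A"),
    ("SCC laptop", "Columbus", "A2", "B"),
]


def index_by_wipe_and_session(rows):
    """Map (Wipe, Session) -> (Sampled_surface, ISS_module); fail on duplicate keys."""
    index = {}
    for surface, module, wipe, session in rows:
        key = (wipe, session)
        if key in index:
            raise ValueError(f"duplicate key {key}")
        index[key] = (surface, module)
    return index


def surfaces_per_session(rows):
    """List the distinct sampled surfaces in each session."""
    by_session = defaultdict(set)
    for surface, _module, _wipe, session in rows:
        by_session[session].add(surface)
    return {session: sorted(s) for session, s in by_session.items()}


lookup = index_by_wipe_and_session(TABLE1)
print(lookup[("A4", "B")])  # ('Light covers', 'Columbus')
# The wipe ID alone is ambiguous: only 5 distinct wipes across 10 rows.
print(len({wipe for _, _, wipe, _ in TABLE1}), "distinct wipes,", len(TABLE1), "rows")
```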

Data Processing Workflow

Results & Discussion

Figure 1: Domain

Figure 1: Phylum

Figure 1: Genus (our plot)

Figure 1: Genus (their plot)

Figure 1 Discussion

  • Phylum: the published figure has an ‘unknown’ category, but we could not find a matching entry in the data
  • Genus:
    • Bars do not sum to 100%
    • Clean-room data are inaccessible
    • Several bar segments are not defined in the legend
    • Poor colour choices: e.g. ‘unknown’ and ‘Lactobacillus’ are difficult to distinguish
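One possible reason for bars that do not reach 100% is that only a subset of genera is drawn, with no ‘Other’ segment to absorb the rest. This is offered as a hypothesis, not a diagnosis of the published figure. The minimal Python sketch below shows total-sum scaling (TSS) per sample. The counts and helper names are hypothetical and are not taken from SD1:

```python
# Sketch: total-sum scaling (TSS) per sample, and why a stacked bar can fall
# short of 100% when only some taxa are drawn. Counts are hypothetical.
import math


def tss(counts):
    """Divide each taxon's count by the sample total (relative abundance)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads; relative abundance is undefined")
    return {taxon: c / total for taxon, c in counts.items()}


def stacked_bar(proportions, shown, other_label="Other"):
    """Segments for the shown taxa plus one segment for everything else."""
    bar = {taxon: proportions.get(taxon, 0.0) for taxon in shown}
    bar[other_label] = sum(p for t, p in proportions.items() if t not in shown)
    return bar


# Hypothetical counts for one sample (not taken from SD1).
sample = {"genus_1": 50, "genus_2": 30, "genus_3": 15, "genus_4": 5}
props = tss(sample)
bar = stacked_bar(props, shown=["genus_1", "genus_2"])

shown_only = sum(v for k, v in bar.items() if k != "Other")
print(round(shown_only, 3))  # 0.8: the bar stops at 80% without an 'Other' segment
assert math.isclose(sum(bar.values()), 1.0)  # with 'Other' it closes to 100%
```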

Figure 4: Domain

Figure 4: Phylum

Figure 4: Genus (our plot)

Figure 4: Genus (their plot)

Figure 4 Discussion

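  • General: their use of sqrt(TSS) (square root of total-sum-scaled abundances) is not justified; if the stacked bars show these values, the segments no longer sum to 1 and no longer represent relative abundance
  • The method used to select the displayed taxa is unclear
    • Our solution: sort by count and keep the top 200, e.g. arrange(desc(count)) %>% top_n(200); slice_max(count, n = 200) is more explicit, since top_n() without a wt argument ranks by the last column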
  • Domain: ‘other sequences’ is not visible in our plot, but it has n = 1, against 59 to 592 for the other domains
    domain n
    Archaea 59
    Bacteria 592
    Eukaryota 390
    other sequences 1
    Viruses 73
  • Genus: we filtered out all non-bacterial taxa
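The sqrt(TSS) and taxon-selection points can be checked numerically with the domain table shown here. The minimal Python sketch below treats n as the value being normalised; the same arithmetic applies to read counts. Helper names are hypothetical. top_n here sorts in descending order and keeps the first n entries, and unlike dplyr's slice_max() it does not keep ties:

```python
# Sketch: TSS vs sqrt(TSS), and a top-N selection, using the domain table.
import math

# n per domain, copied from the table in the Figure 4 discussion.
DOMAIN_N = {"Archaea": 59, "Bacteria": 592, "Eukaryota": 390,
            "other sequences": 1, "Viruses": 73}


def tss(counts):
    """Total-sum scaling: each value divided by the column total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def sqrt_tss(counts):
    """Square root of the TSS proportions."""
    return {k: math.sqrt(p) for k, p in tss(counts).items()}


def top_n(counts, n):
    """Keep the n largest entries; ties broken alphabetically (no tie expansion)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return dict(ranked[:n])


shares = tss(DOMAIN_N)
print(f"other sequences: {shares['other sequences']:.2%}")   # 0.09%
print(f"sum of TSS:       {sum(shares.values()):.3f}")        # 1.000
print(f"sum of sqrt(TSS): {sum(sqrt_tss(DOMAIN_N).values()):.3f}")  # well above 1
print(top_n(DOMAIN_N, 2))  # {'Bacteria': 592, 'Eukaryota': 390}
```

The square root also inflates small categories relative to large ones, which is why a stacked bar of sqrt(TSS) values misrepresents composition.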

Conclusion

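  • Reproducing Figs. 1 and 4 from the supplementary data exposed a reproducibility problem: not every published element can be regenerated from the released data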

  • The same task can be accomplished in many ways, so undocumented processing choices are hard to reconstruct

  • Further perspectives & improvements

    • Obtaining the clean-room data could substantially change our SD1-based reconstruction of Figure 1
    • Automate TSS normalisation with purrr, and rename the SD1 sample columns by looking them up in Table 1
    • We tried to recreate the PCoA plot, but the large proportion of zero counts made it infeasible
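The zero problem can be made concrete with the three SD1 rows from the excerpt. The minimal Python sketch below assumes Bray-Curtis dissimilarity as the PCoA input. That is a common choice for count data, but not necessarily the one used in the paper. Names are hypothetical, and the full table has many more RSVs, so its sparsity may differ. In this excerpt 66 of 72 cells are zero. Two samples with no shared RSVs get the maximum dissimilarity of 1 regardless of abundance, and two empty samples give 0/0:

```python
# Sketch: sparsity and Bray-Curtis dissimilarity on the SD1 excerpt.
# Assumption: Bray-Curtis is the distance fed into PCoA.
import math

# Three SD1 rows (RSV_0001..RSV_0003); columns are the 24 samples in SD1 order.
ROWS = [
    "0 0 0 0 0 0 0 0 0 0 10 0 0 160 0 0 0 0 0 31 1080 0 0 0",
    "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 9 0 0",
    "0 0 0 74 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0",
]
MATRIX = [[int(x) for x in row.split()] for row in ROWS]  # RSVs x samples

A1, A2, A4, C5, B5 = 0, 1, 3, 13, 20  # column positions of a few samples


def zero_fraction(matrix):
    """Share of cells in the matrix that are exactly zero."""
    cells = [v for row in matrix for v in row]
    return cells.count(0) / len(cells)


def column(matrix, j):
    """Counts of all RSVs in sample column j."""
    return [row[j] for row in matrix]


def bray_curtis(x, y):
    """sum|x_i - y_i| / sum(x_i + y_i); nan when both samples are empty (0/0)."""
    denom = sum(x) + sum(y)
    if denom == 0:
        return math.nan
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


print(f"zero cells: {zero_fraction(MATRIX):.1%}")            # 91.7%
print(bray_curtis(column(MATRIX, C5), column(MATRIX, B5)))   # about 0.74, one shared RSV
print(bray_curtis(column(MATRIX, A4), column(MATRIX, B5)))   # 1.0, nothing shared
print(bray_curtis(column(MATRIX, A1), column(MATRIX, A2)))   # nan, both empty
```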